Overview

Dataset statistics

Number of variables: 33
Number of observations: 18047
Missing cells: 44532
Missing cells (%): 7.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 24.2 MiB
Average record size in memory: 1.4 KiB
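The percentage and per-record figures above follow from the raw counts. The sketch below re-derives them from the numbers in this table; it assumes the missing-cell percentage is taken over all cells (variables × observations) and that 1 MiB = 1024 KiB. The variable names are illustrative only.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 33
n_observations = 18047
missing_cells = 44532
total_size_mib = 24.2

# Assumption: the percentage is computed over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024 KiB.
avg_record_kib = total_size_mib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_kib:.1f} KiB")
```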

Variable types

Numeric: 6
Boolean: 8
Categorical: 19

Warnings

Genetic Disorder is highly correlated with Disorder Subclass High correlation
Disorder Subclass is highly correlated with Genetic Disorder High correlation
Status is highly correlated with Autopsy shows birth defect (if applicable) High correlation
Autopsy shows birth defect (if applicable) is highly correlated with Status High correlation
Patient Age has 1060 (5.9%) missing values Missing
Inherited from father has 220 (1.2%) missing values Missing
Maternal gene has 2071 (11.5%) missing values Missing
Mother's age has 4457 (24.7%) missing values Missing
Father's age has 4418 (24.5%) missing values Missing
Respiratory Rate (breaths/min) has 1570 (8.7%) missing values Missing
Heart Rate (rates/min has 1528 (8.5%) missing values Missing
Follow-up has 1575 (8.7%) missing values Missing
Gender has 1573 (8.7%) missing values Missing
Birth asphyxia has 1552 (8.6%) missing values Missing
Autopsy shows birth defect (if applicable) has 757 (4.2%) missing values Missing
Folic acid details (peri-conceptional) has 1564 (8.7%) missing values Missing
H/O serious maternal illness has 1552 (8.6%) missing values Missing
H/O radiation exposure (x-ray) has 1584 (8.8%) missing values Missing
H/O substance abuse has 1632 (9.0%) missing values Missing
Assisted conception IVF/ART has 1590 (8.8%) missing values Missing
History of anomalies in previous pregnancies has 1614 (8.9%) missing values Missing
No. of previous abortion has 1546 (8.6%) missing values Missing
Birth defects has 1565 (8.7%) missing values Missing
White Blood cell count (thousand per microliter) has 1607 (8.9%) missing values Missing
Blood test result has 1564 (8.7%) missing values Missing
Symptom 1 has 1578 (8.7%) missing values Missing
Symptom 2 has 1646 (9.1%) missing values Missing
Symptom 3 has 1530 (8.5%) missing values Missing
Symptom 4 has 1566 (8.7%) missing values Missing
Symptom 5 has 1613 (8.9%) missing values Missing
df_index is uniformly distributed Uniform
df_index has unique values Unique
Blood cell count (mcL) has unique values Unique
Patient Age has 1152 (6.4%) zeros Zeros

Reproduction

Analysis started: 2021-09-25 19:36:20.677162
Analysis finished: 2021-09-25 19:36:48.761131
Duration: 28.08 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json
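The reported duration is the difference between the two timestamps. A minimal check using only the Python standard library, with the timestamps copied from the lines above:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block.
started = datetime.fromisoformat("2021-09-25 19:36:20.677162")
finished = datetime.fromisoformat("2021-09-25 19:36:48.761131")

duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")
```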

Variables

df_index
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 18047
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11062.08367
Minimum: 0
Maximum: 22082
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 141.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1099.3
Q1: 5557.5
median: 11082
Q3: 16577.5
95-th percentile: 20991.7
Maximum: 22082
Range: 22082
Interquartile range (IQR): 11020

Descriptive statistics

Standard deviation: 6370.599895
Coefficient of variation (CV): 0.5758951102
Kurtosis: -1.19649644
Mean: 11062.08367
Median Absolute Deviation (MAD): 5512
Skewness: -0.003791366016
Sum: 199637424
Variance: 40584543.02
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
Value                  Count  Frequency (%)
0                      1      < 0.1%
3419                   1      < 0.1%
19811                  1      < 0.1%
17762                  1      < 0.1%
21856                  1      < 0.1%
11615                  1      < 0.1%
9566                   1      < 0.1%
15709                  1      < 0.1%
1370                   1      < 0.1%
7529                   1      < 0.1%
Other values (18037)   18037  99.9%

Value                  Count  Frequency (%)
0                      1      < 0.1%
2                      1      < 0.1%
3                      1      < 0.1%
4                      1      < 0.1%
5                      1      < 0.1%
6                      1      < 0.1%
7                      1      < 0.1%
8                      1      < 0.1%
9                      1      < 0.1%
10                     1      < 0.1%

Value                  Count  Frequency (%)
22082                  1      < 0.1%
22080                  1      < 0.1%
22079                  1      < 0.1%
22078                  1      < 0.1%
22077                  1      < 0.1%
22076                  1      < 0.1%
22075                  1      < 0.1%
22074                  1      < 0.1%
22072                  1      < 0.1%
22071                  1      < 0.1%

Patient Age
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 15
Distinct (%): 0.1%
Missing: 1060
Missing (%): 5.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.948784365
Minimum: 0
Maximum: 14
Zeros: 1152
Zeros (%): 6.4%
Negative: 0
Negative (%): 0.0%
Memory size: 141.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 3
median: 7
Q3: 11
95-th percentile: 14
Maximum: 14
Range: 14
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 4.314394773
Coefficient of variation (CV): 0.6208848263
Kurtosis: -1.208603216
Mean: 6.948784365
Median Absolute Deviation (MAD): 4
Skewness: 0.0173979278
Sum: 118039
Variance: 18.61400226
Monotonicity: Not monotonic
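For a variable with missing values, the reported figures are consistent with the mean being taken over the non-missing observations only, while the zero and missing percentages are taken over all rows. The sketch below checks this against the Patient Age numbers; the interpretation is inferred from the figures, not from the profiler's source.

```python
# Figures copied from the overview and the Patient Age statistics.
n_observations = 18047
missing = 1060
zeros = 1152
total = 118039  # reported "Sum"

n_valid = n_observations - missing            # non-missing observations
mean = total / n_valid                        # mean over non-missing values
zeros_pct = 100 * zeros / n_observations      # share of all rows
missing_pct = 100 * missing / n_observations  # share of all rows

print(n_valid, round(mean, 9), f"{zeros_pct:.1f}%", f"{missing_pct:.1f}%")
```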
Histogram with fixed size bins (bins=15)
Value             Count  Frequency (%)
4                 1170   6.5%
5                 1168   6.5%
9                 1166   6.5%
12                1156   6.4%
0                 1152   6.4%
2                 1152   6.4%
3                 1140   6.3%
7                 1133   6.3%
6                 1127   6.2%
13                1125   6.2%
Other values (5)  5498   30.5%

Value             Count  Frequency (%)
0                 1152   6.4%
1                 1124   6.2%
2                 1152   6.4%
3                 1140   6.3%
4                 1170   6.5%
5                 1168   6.5%
6                 1127   6.2%
7                 1133   6.3%
8                 1109   6.1%
9                 1166   6.5%

Value             Count  Frequency (%)
14                1094   6.1%
13                1125   6.2%
12                1156   6.4%
11                1089   6.0%
10                1082   6.0%
9                 1166   6.5%
8                 1109   6.1%
7                 1133   6.3%
6                 1127   6.2%
5                 1168   6.5%
Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size17.7 KiB
True
10743 
False
7304 
ValueCountFrequency (%)
True10743
59.5%
False7304
40.5%

Inherited from father
Boolean

MISSING

Distinct2
Distinct (%)< 0.1%
Missing220
Missing (%)1.2%
Memory size35.4 KiB
False
10773 
True
7054 
(Missing)
 
220
ValueCountFrequency (%)
False10773
59.7%
True7054
39.1%
(Missing)220
 
1.2%

Maternal gene
Boolean

MISSING

Distinct2
Distinct (%)< 0.1%
Missing2071
Missing (%)11.5%
Memory size35.4 KiB
True
8803 
False
7173 
(Missing)
2071 
ValueCountFrequency (%)
True8803
48.8%
False7173
39.7%
(Missing)2071
 
11.5%
Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size17.7 KiB
False
10239 
True
7808 
ValueCountFrequency (%)
False10239
56.7%
True7808
43.3%

Blood cell count (mcL)
Real number (ℝ≥0)

UNIQUE

Distinct: 18047
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.899198041
Minimum: 4.146229815
Maximum: 5.60982897
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 141.1 KiB

Quantile statistics

Minimum: 4.146229815
5-th percentile: 4.571443384
Q1: 4.76419924
median: 4.9003065
Q3: 5.033653581
95-th percentile: 5.226512704
Maximum: 5.60982897
Range: 1.463599155
Interquartile range (IQR): 0.2694543411

Descriptive statistics

Standard deviation: 0.1990609098
Coefficient of variation (CV): 0.04063132539
Kurtosis: -0.05166438656
Mean: 4.899198041
Median Absolute Deviation (MAD): 0.1346641969
Skewness: 0.004333523535
Sum: 88415.82705
Variance: 0.0396252458
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count  Frequency (%)
5.121511338            1      < 0.1%
4.817233804            1      < 0.1%
5.267073078            1      < 0.1%
4.68722782             1      < 0.1%
4.78127052             1      < 0.1%
5.110535381            1      < 0.1%
5.033066871            1      < 0.1%
4.866370894            1      < 0.1%
4.643701909            1      < 0.1%
4.8333474              1      < 0.1%
Other values (18037)   18037  99.9%

Value                  Count  Frequency (%)
4.146229815            1      < 0.1%
4.185821105            1      < 0.1%
4.203464164            1      < 0.1%
4.215599036            1      < 0.1%
4.23572663             1      < 0.1%
4.248565352            1      < 0.1%
4.250212496            1      < 0.1%
4.258798724            1      < 0.1%
4.264795714            1      < 0.1%
4.267119357            1      < 0.1%

Value                  Count  Frequency (%)
5.60982897             1      < 0.1%
5.592450707            1      < 0.1%
5.571966475            1      < 0.1%
5.564212158            1      < 0.1%
5.558932575            1      < 0.1%
5.553951564            1      < 0.1%
5.536403702            1      < 0.1%
5.532782297            1      < 0.1%
5.532400451            1      < 0.1%
5.525947686            1      < 0.1%

Mother's age
Real number (ℝ≥0)

MISSING

Distinct: 34
Distinct (%): 0.3%
Missing: 4457
Missing (%): 24.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 34.57645327
Minimum: 18
Maximum: 51
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 141.1 KiB

Quantile statistics

Minimum: 18
5-th percentile: 19
Q1: 26
median: 35
Q3: 43
95-th percentile: 50
Maximum: 51
Range: 33
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 9.823005226
Coefficient of variation (CV): 0.2840952237
Kurtosis: -1.214144039
Mean: 34.57645327
Median Absolute Deviation (MAD): 9
Skewness: -0.007944714443
Sum: 469894
Variance: 96.49143168
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=34)
Value              Count  Frequency (%)
23                 449    2.5%
48                 442    2.4%
40                 439    2.4%
28                 434    2.4%
19                 429    2.4%
41                 427    2.4%
47                 425    2.4%
45                 421    2.3%
35                 416    2.3%
27                 410    2.3%
Other values (24)  9298   51.5%
(Missing)          4457   24.7%

Value              Count  Frequency (%)
18                 368    2.0%
19                 429    2.4%
20                 368    2.0%
21                 407    2.3%
22                 391    2.2%
23                 449    2.5%
24                 396    2.2%
25                 356    2.0%
26                 382    2.1%
27                 410    2.3%

Value              Count  Frequency (%)
51                 381    2.1%
50                 401    2.2%
49                 409    2.3%
48                 442    2.4%
47                 425    2.4%
46                 400    2.2%
45                 421    2.3%
44                 392    2.2%
43                 362    2.0%
42                 390    2.2%

Father's age
Real number (ℝ≥0)

MISSING

Distinct: 45
Distinct (%): 0.3%
Missing: 4418
Missing (%): 24.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 41.97255851
Minimum: 20
Maximum: 64
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 141.1 KiB

Quantile statistics

Minimum: 20
5-th percentile: 22
Q1: 30
median: 42
Q3: 53
95-th percentile: 62
Maximum: 64
Range: 44
Interquartile range (IQR): 23

Descriptive statistics

Standard deviation: 13.0644407
Coefficient of variation (CV): 0.3112614805
Kurtosis: -1.221300349
Mean: 41.97255851
Median Absolute Deviation (MAD): 11
Skewness: -0.004125738126
Sum: 572044
Variance: 170.6796109
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=45)
Value              Count  Frequency (%)
20                 354    2.0%
29                 345    1.9%
61                 334    1.9%
49                 331    1.8%
56                 325    1.8%
57                 324    1.8%
53                 319    1.8%
26                 319    1.8%
27                 318    1.8%
52                 317    1.8%
Other values (35)  10343  57.3%
(Missing)          4418   24.5%

Value              Count  Frequency (%)
20                 354    2.0%
21                 293    1.6%
22                 289    1.6%
23                 294    1.6%
24                 300    1.7%
25                 288    1.6%
26                 319    1.8%
27                 318    1.8%
28                 304    1.7%
29                 345    1.9%

Value              Count  Frequency (%)
64                 315    1.7%
63                 263    1.5%
62                 298    1.7%
61                 334    1.9%
60                 298    1.7%
59                 312    1.7%
58                 301    1.7%
57                 324    1.8%
56                 325    1.8%
55                 300    1.7%

Status
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.1 MiB
Alive
9061 
Deceased
8986 

Length

Max length8
Median length5
Mean length6.493766277
Min length5

Characters and Unicode

Total characters117193
Distinct characters10
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
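The length and character figures for Status can be reproduced from the two category counts alone. The sketch below does so as a consistency check; the dictionary name is illustrative.

```python
# Category counts copied from the Status summary.
counts = {"Alive": 9061, "Deceased": 8986}

n = sum(counts.values())
# Total characters: length of each label weighted by its count.
total_chars = sum(len(label) * c for label, c in counts.items())
mean_length = total_chars / n
# Distinct characters across all labels (case-sensitive).
distinct_chars = len(set("".join(counts)))

print(total_chars, round(mean_length, 9), distinct_chars)
```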

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowAlive
2nd rowAlive
3rd rowDeceased
4th rowAlive
5th rowDeceased

Common Values

ValueCountFrequency (%)
Alive9061
50.2%
Deceased8986
49.8%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
alive9061
50.2%
deceased8986
49.8%

Most occurring characters

ValueCountFrequency (%)
e36019
30.7%
A9061
 
7.7%
l9061
 
7.7%
i9061
 
7.7%
v9061
 
7.7%
D8986
 
7.7%
c8986
 
7.7%
a8986
 
7.7%
s8986
 
7.7%
d8986
 
7.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter99146
84.6%
Uppercase Letter18047
 
15.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e36019
36.3%
l9061
 
9.1%
i9061
 
9.1%
v9061
 
9.1%
c8986
 
9.1%
a8986
 
9.1%
s8986
 
9.1%
d8986
 
9.1%
Uppercase Letter
ValueCountFrequency (%)
A9061
50.2%
D8986
49.8%

Most occurring scripts

ValueCountFrequency (%)
Latin117193
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e36019
30.7%
A9061
 
7.7%
l9061
 
7.7%
i9061
 
7.7%
v9061
 
7.7%
D8986
 
7.7%
c8986
 
7.7%
a8986
 
7.7%
s8986
 
7.7%
d8986
 
7.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII117193
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e36019
30.7%
A9061
 
7.7%
l9061
 
7.7%
i9061
 
7.7%
v9061
 
7.7%
D8986
 
7.7%
c8986
 
7.7%
a8986
 
7.7%
s8986
 
7.7%
d8986
 
7.7%

Respiratory Rate (breaths/min)
Categorical

MISSING

Distinct2
Distinct (%)< 0.1%
Missing1570
Missing (%)8.7%
Memory size1.1 MiB
Normal (30-60)
8281 
Tachypnea
8196 

Length

Max length14
Median length14
Mean length11.51289677
Min length9

Characters and Unicode

Total characters189698
Distinct characters20
Distinct categories7 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNormal (30-60)
2nd rowNormal (30-60)
3rd rowTachypnea
4th rowTachypnea
5th rowNormal (30-60)

Common Values

ValueCountFrequency (%)
Normal (30-60)8281
45.9%
Tachypnea8196
45.4%
(Missing)1570
 
8.7%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
30-608281
33.4%
normal8281
33.4%
tachypnea8196
33.1%

Most occurring characters

ValueCountFrequency (%)
a24673
 
13.0%
016562
 
8.7%
N8281
 
4.4%
o8281
 
4.4%
r8281
 
4.4%
m8281
 
4.4%
l8281
 
4.4%
8281
 
4.4%
(8281
 
4.4%
38281
 
4.4%
Other values (10)82215
43.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter106973
56.4%
Decimal Number33124
 
17.5%
Uppercase Letter16477
 
8.7%
Space Separator8281
 
4.4%
Open Punctuation8281
 
4.4%
Dash Punctuation8281
 
4.4%
Close Punctuation8281
 
4.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a24673
23.1%
o8281
 
7.7%
r8281
 
7.7%
m8281
 
7.7%
l8281
 
7.7%
c8196
 
7.7%
h8196
 
7.7%
y8196
 
7.7%
p8196
 
7.7%
n8196
 
7.7%
Decimal Number
ValueCountFrequency (%)
016562
50.0%
38281
25.0%
68281
25.0%
Uppercase Letter
ValueCountFrequency (%)
N8281
50.3%
T8196
49.7%
Space Separator
ValueCountFrequency (%)
8281
100.0%
Open Punctuation
ValueCountFrequency (%)
(8281
100.0%
Dash Punctuation
ValueCountFrequency (%)
-8281
100.0%
Close Punctuation
ValueCountFrequency (%)
)8281
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin123450
65.1%
Common66248
34.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a24673
20.0%
N8281
 
6.7%
o8281
 
6.7%
r8281
 
6.7%
m8281
 
6.7%
l8281
 
6.7%
T8196
 
6.6%
c8196
 
6.6%
h8196
 
6.6%
y8196
 
6.6%
Other values (3)24588
19.9%
Common
ValueCountFrequency (%)
016562
25.0%
8281
12.5%
(8281
12.5%
38281
12.5%
-8281
12.5%
68281
12.5%
)8281
12.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII189698
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a24673
 
13.0%
016562
 
8.7%
N8281
 
4.4%
o8281
 
4.4%
r8281
 
4.4%
m8281
 
4.4%
l8281
 
4.4%
8281
 
4.4%
(8281
 
4.4%
38281
 
4.4%
Other values (10)82215
43.3%

Heart Rate (rates/min
Categorical

MISSING

Distinct2
Distinct (%)< 0.1%
Missing1528
Missing (%)8.5%
Memory size1.1 MiB
Normal
8396 
Tachycardia
8123 

Length

Max length11
Median length6
Mean length8.45868394
Min length6

Characters and Unicode

Total characters139729
Distinct characters12
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNormal
2nd rowTachycardia
3rd rowNormal
4th rowTachycardia
5th rowNormal

Common Values

ValueCountFrequency (%)
Normal8396
46.5%
Tachycardia8123
45.0%
(Missing)1528
 
8.5%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
normal8396
50.8%
tachycardia8123
49.2%

Most occurring characters

ValueCountFrequency (%)
a32765
23.4%
r16519
11.8%
c16246
11.6%
N8396
 
6.0%
o8396
 
6.0%
m8396
 
6.0%
l8396
 
6.0%
T8123
 
5.8%
h8123
 
5.8%
y8123
 
5.8%
Other values (2)16246
11.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter123210
88.2%
Uppercase Letter16519
 
11.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a32765
26.6%
r16519
13.4%
c16246
13.2%
o8396
 
6.8%
m8396
 
6.8%
l8396
 
6.8%
h8123
 
6.6%
y8123
 
6.6%
d8123
 
6.6%
i8123
 
6.6%
Uppercase Letter
ValueCountFrequency (%)
N8396
50.8%
T8123
49.2%

Most occurring scripts

ValueCountFrequency (%)
Latin139729
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a32765
23.4%
r16519
11.8%
c16246
11.6%
N8396
 
6.0%
o8396
 
6.0%
m8396
 
6.0%
l8396
 
6.0%
T8123
 
5.8%
h8123
 
5.8%
y8123
 
5.8%
Other values (2)16246
11.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII139729
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a32765
23.4%
r16519
11.8%
c16246
11.6%
N8396
 
6.0%
o8396
 
6.0%
m8396
 
6.0%
l8396
 
6.0%
T8123
 
5.8%
h8123
 
5.8%
y8123
 
5.8%
Other values (2)16246
11.6%

Follow-up
Categorical

MISSING

Distinct2
Distinct (%)< 0.1%
Missing1575
Missing (%)8.7%
Memory size1022.5 KiB
Low
8322 
High
8150 

Length

Max length4
Median length3
Mean length3.494779019
Min length3

Characters and Unicode

Total characters57566
Distinct characters7
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowHigh
2nd rowLow
3rd rowHigh
4th rowLow
5th rowLow

Common Values

ValueCountFrequency (%)
Low8322
46.1%
High8150
45.2%
(Missing)1575
 
8.7%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
low8322
50.5%
high8150
49.5%

Most occurring characters

ValueCountFrequency (%)
L8322
14.5%
o8322
14.5%
w8322
14.5%
H8150
14.2%
i8150
14.2%
g8150
14.2%
h8150
14.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter41094
71.4%
Uppercase Letter16472
28.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o8322
20.3%
w8322
20.3%
i8150
19.8%
g8150
19.8%
h8150
19.8%
Uppercase Letter
ValueCountFrequency (%)
L8322
50.5%
H8150
49.5%

Most occurring scripts

ValueCountFrequency (%)
Latin57566
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
L8322
14.5%
o8322
14.5%
w8322
14.5%
H8150
14.2%
i8150
14.2%
g8150
14.2%
h8150
14.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII57566
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
L8322
14.5%
o8322
14.5%
w8322
14.5%
H8150
14.2%
i8150
14.2%
g8150
14.2%
h8150
14.2%

Gender
Categorical

MISSING

Distinct3
Distinct (%)< 0.1%
Missing1573
Missing (%)8.7%
Memory size1.0 MiB
Male
5519 
Ambiguous
5509 
Female
5446 

Length

Max length9
Median length6
Mean length6.333191696
Min length4

Characters and Unicode

Total characters104333
Distinct characters13
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowMale
2nd rowMale
3rd rowFemale
4th rowMale
5th rowMale

Common Values

ValueCountFrequency (%)
Male5519
30.6%
Ambiguous5509
30.5%
Female5446
30.2%
(Missing)1573
 
8.7%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
male5519
33.5%
ambiguous5509
33.4%
female5446
33.1%

Most occurring characters

ValueCountFrequency (%)
e16411
15.7%
u11018
10.6%
a10965
10.5%
l10965
10.5%
m10955
10.5%
M5519
 
5.3%
A5509
 
5.3%
b5509
 
5.3%
i5509
 
5.3%
g5509
 
5.3%
Other values (3)16464
15.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter87859
84.2%
Uppercase Letter16474
 
15.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e16411
18.7%
u11018
12.5%
a10965
12.5%
l10965
12.5%
m10955
12.5%
b5509
 
6.3%
i5509
 
6.3%
g5509
 
6.3%
o5509
 
6.3%
s5509
 
6.3%
Uppercase Letter
ValueCountFrequency (%)
M5519
33.5%
A5509
33.4%
F5446
33.1%

Most occurring scripts

ValueCountFrequency (%)
Latin104333
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e16411
15.7%
u11018
10.6%
a10965
10.5%
l10965
10.5%
m10955
10.5%
M5519
 
5.3%
A5509
 
5.3%
b5509
 
5.3%
i5509
 
5.3%
g5509
 
5.3%
Other values (3)16464
15.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII104333
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e16411
15.7%
u11018
10.6%
a10965
10.5%
l10965
10.5%
m10955
10.5%
M5519
 
5.3%
A5509
 
5.3%
b5509
 
5.3%
i5509
 
5.3%
g5509
 
5.3%
Other values (3)16464
15.8%

Birth asphyxia
Categorical

MISSING

Distinct4
Distinct (%)< 0.1%
Missing1552
Missing (%)8.6%
Memory size1.1 MiB
Yes
4248 
Not available
4120 
No record
4112 
No
4015 

Length

Max length13
Median length3
Mean length6.750045468
Min length2

Characters and Unicode

Total characters111342
Distinct characters15
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNo record
2nd rowNot available
3rd rowNot available
4th rowNot available
5th rowNo record

Common Values

ValueCountFrequency (%)
Yes4248
23.5%
Not available4120
22.8%
No record4112
22.8%
No4015
22.2%
(Missing)1552
 
8.6%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
no8127
32.9%
yes4248
17.2%
available4120
16.7%
not4120
16.7%
record4112
16.6%

Most occurring characters

ValueCountFrequency (%)
o16359
14.7%
e12480
11.2%
a12360
11.1%
N12247
11.0%
l8240
7.4%
8232
 
7.4%
r8224
 
7.4%
Y4248
 
3.8%
s4248
 
3.8%
t4120
 
3.7%
Other values (5)20584
18.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter86615
77.8%
Uppercase Letter16495
 
14.8%
Space Separator8232
 
7.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o16359
18.9%
e12480
14.4%
a12360
14.3%
l8240
9.5%
r8224
9.5%
s4248
 
4.9%
t4120
 
4.8%
v4120
 
4.8%
i4120
 
4.8%
b4120
 
4.8%
Other values (2)8224
9.5%
Uppercase Letter
ValueCountFrequency (%)
N12247
74.2%
Y4248
 
25.8%
Space Separator
ValueCountFrequency (%)
8232
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin103110
92.6%
Common8232
 
7.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
o16359
15.9%
e12480
12.1%
a12360
12.0%
N12247
11.9%
l8240
8.0%
r8224
8.0%
Y4248
 
4.1%
s4248
 
4.1%
t4120
 
4.0%
v4120
 
4.0%
Other values (4)16464
16.0%
Common
ValueCountFrequency (%)
8232
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII111342
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o16359
14.7%
e12480
11.2%
a12360
11.1%
N12247
11.0%
l8240
7.4%
8232
 
7.4%
r8224
 
7.4%
Y4248
 
3.8%
s4248
 
3.8%
t4120
 
3.7%
Other values (5)20584
18.5%

Autopsy shows birth defect (if applicable)
Categorical

HIGH CORRELATION
MISSING

Distinct4
Distinct (%)< 0.1%
Missing757
Missing (%)4.2%
Memory size1.1 MiB
Not applicable
9061 
None
2805 
Yes
2781 
No
2643 

Length

Max length14
Median length14
Mean length8.774031232
Min length2

Characters and Unicode

Total characters151703
Distinct characters14
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNot applicable
2nd rowNot applicable
3rd rowNo
4th rowNot applicable
5th rowNone

Common Values

ValueCountFrequency (%)
Not applicable9061
50.2%
None2805
 
15.5%
Yes2781
 
15.4%
No2643
 
14.6%
(Missing)757
 
4.2%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
applicable9061
34.4%
not9061
34.4%
none2805
 
10.6%
yes2781
 
10.6%
no2643
 
10.0%

Most occurring characters

ValueCountFrequency (%)
a18122
11.9%
p18122
11.9%
l18122
11.9%
e14647
9.7%
N14509
9.6%
o14509
9.6%
t9061
6.0%
9061
6.0%
i9061
6.0%
c9061
6.0%
Other values (4)17428
11.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter125352
82.6%
Uppercase Letter17290
 
11.4%
Space Separator9061
 
6.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a18122
14.5%
p18122
14.5%
l18122
14.5%
e14647
11.7%
o14509
11.6%
t9061
7.2%
i9061
7.2%
c9061
7.2%
b9061
7.2%
n2805
 
2.2%
Uppercase Letter
ValueCountFrequency (%)
N14509
83.9%
Y2781
 
16.1%
Space Separator
ValueCountFrequency (%)
9061
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin142642
94.0%
Common9061
 
6.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a18122
12.7%
p18122
12.7%
l18122
12.7%
e14647
10.3%
N14509
10.2%
o14509
10.2%
t9061
6.4%
i9061
6.4%
c9061
6.4%
b9061
6.4%
Other values (3)8367
5.9%
Common
ValueCountFrequency (%)
9061
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII151703
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a18122
11.9%
p18122
11.9%
l18122
11.9%
e14647
9.7%
N14509
9.6%
o14509
9.6%
t9061
6.0%
9061
6.0%
i9061
6.0%
c9061
6.0%
Other values (4)17428
11.5%

Folic acid details (peri-conceptional)
Boolean

MISSING

Distinct2
Distinct (%)< 0.1%
Missing1564
Missing (%)8.7%
Memory size35.4 KiB
True
8336 
False
8147 
(Missing)
1564 
ValueCountFrequency (%)
True8336
46.2%
False8147
45.1%
(Missing)1564
 
8.7%

H/O serious maternal illness
Boolean

MISSING

Distinct2
Distinct (%)< 0.1%
Missing1552
Missing (%)8.6%
Memory size35.4 KiB
False
8292 
True
8203 
(Missing)
1552 
ValueCountFrequency (%)
False8292
45.9%
True8203
45.5%
(Missing)1552
 
8.6%

H/O radiation exposure (x-ray)
Categorical

MISSING

Distinct4
Distinct (%)< 0.1%
Missing1584
Missing (%)8.8%
Memory size1.0 MiB
Not applicable
4156 
No
4143 
Yes
4130 
-
4034 

Length

Max length14
Median length3
Mean length5.035169775
Min length1

Characters and Unicode

Total characters82894
Distinct characters14
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNo
2nd rowYes
3rd row-
4th row-
5th rowNo

Common Values

ValueCountFrequency (%)
Not applicable4156
23.0%
No4143
23.0%
Yes4130
22.9%
-4034
22.4%
(Missing)1584
 
8.8%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
applicable4156
20.2%
not4156
20.2%
no4143
20.1%
yes4130
20.0%
4034
19.6%

Most occurring characters

ValueCountFrequency (%)
a8312
10.0%
p8312
10.0%
l8312
10.0%
N8299
10.0%
o8299
10.0%
e8286
10.0%
t4156
 
5.0%
4156
 
5.0%
i4156
 
5.0%
c4156
 
5.0%
Other values (4)16450
19.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter62275
75.1%
Uppercase Letter12429
 
15.0%
Space Separator4156
 
5.0%
Dash Punctuation4034
 
4.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a8312
13.3%
p8312
13.3%
l8312
13.3%
o8299
13.3%
e8286
13.3%
t4156
6.7%
i4156
6.7%
c4156
6.7%
b4156
6.7%
s4130
6.6%
Uppercase Letter
ValueCountFrequency (%)
N8299
66.8%
Y4130
33.2%
Dash Punctuation
ValueCountFrequency (%)
-4034
100.0%
Space Separator
ValueCountFrequency (%)
4156
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin74704
90.1%
Common8190
 
9.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a8312
11.1%
p8312
11.1%
l8312
11.1%
N8299
11.1%
o8299
11.1%
e8286
11.1%
t4156
5.6%
i4156
5.6%
c4156
5.6%
b4156
5.6%
Other values (2)8260
11.1%
Common
ValueCountFrequency (%)
4156
50.7%
-4034
49.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII82894
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a8312
10.0%
p8312
10.0%
l8312
10.0%
N8299
10.0%
o8299
10.0%
e8286
10.0%
t4156
 
5.0%
4156
 
5.0%
i4156
 
5.0%
c4156
 
5.0%
Other values (4)16450
19.8%

H/O substance abuse
Categorical

MISSING

Distinct4
Distinct (%)< 0.1%
Missing1632
Missing (%)9.0%
Memory size1.0 MiB
No
4170 
-
4130 
Yes
4125 
Not applicable
3990 

Length

Max length14
Median length2
Mean length4.91653975
Min length1

Characters and Unicode

Total characters80705
Distinct characters14
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNo
2nd rowNot applicable
3rd rowNot applicable
4th rowNo
5th rowNot applicable

Common Values

ValueCountFrequency (%)
No4170
23.1%
-4130
22.9%
Yes4125
22.9%
Not applicable3990
22.1%
(Missing)1632
 
9.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
no4170
20.4%
4130
20.2%
yes4125
20.2%
applicable3990
19.6%
not3990
19.6%

Most occurring characters

ValueCountFrequency (%)
N8160
10.1%
o8160
10.1%
e8115
10.1%
a7980
9.9%
p7980
9.9%
l7980
9.9%
-4130
 
5.1%
Y4125
 
5.1%
s4125
 
5.1%
t3990
 
4.9%
Other values (4)15960
19.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter60300
74.7%
Uppercase Letter12285
 
15.2%
Dash Punctuation4130
 
5.1%
Space Separator3990
 
4.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o8160
13.5%
e8115
13.5%
a7980
13.2%
p7980
13.2%
l7980
13.2%
s4125
6.8%
t3990
6.6%
i3990
6.6%
c3990
6.6%
b3990
6.6%
Uppercase Letter
ValueCountFrequency (%)
N8160
66.4%
Y4125
33.6%
Space Separator
ValueCountFrequency (%)
3990
100.0%
Dash Punctuation
ValueCountFrequency (%)
-4130
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin72585
89.9%
Common8120
 
10.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
N8160
11.2%
o8160
11.2%
e8115
11.2%
a7980
11.0%
p7980
11.0%
l7980
11.0%
Y4125
5.7%
s4125
5.7%
t3990
5.5%
i3990
5.5%
Other values (2)7980
11.0%
Common
ValueCountFrequency (%)
-4130
50.9%
3990
49.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII80705
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N8160
10.1%
o8160
10.1%
e8115
10.1%
a7980
9.9%
p7980
9.9%
l7980
9.9%
-4130
 
5.1%
Y4125
 
5.1%
s4125
 
5.1%
t3990
 
4.9%
Other values (4)15960
19.8%

Assisted conception IVF/ART
Boolean

MISSING


Assisted conception IVF/ART
Boolean

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 1590
Missing (%): 8.8%
Memory size: 35.4 KiB

Value      Count  Frequency (%)
True       8274   45.8%
False      8183   45.3%
(Missing)  1590   8.8%

History of anomalies in previous pregnancies
Boolean

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 1614
Missing (%): 8.9%
Memory size: 35.4 KiB

Value      Count  Frequency (%)
True       8285   45.9%
False      8148   45.1%
(Missing)  1614   8.9%

No. of previous abortion
Categorical

MISSING

Distinct: 5
Distinct (%): < 0.1%
Missing: 1546
Missing (%): 8.6%
Memory size: 1.0 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 49503
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 4.0
2nd row: 1.0
3rd row: 4.0
4th row: 0.0
5th row: 3.0

Common Values

Value      Count  Frequency (%)
2.0        3396   18.8%
1.0        3282   18.2%
4.0        3281   18.2%
0.0        3277   18.2%
3.0        3265   18.1%
(Missing)  1546   8.6%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
2.0    3396   20.6%
1.0    3282   19.9%
4.0    3281   19.9%
0.0    3277   19.9%
3.0    3265   19.8%

Most occurring characters

Value  Count  Frequency (%)
0      19778  40.0%
.      16501  33.3%
2      3396   6.9%
1      3282   6.6%
4      3281   6.6%
3      3265   6.6%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     33002  66.7%
Other Punctuation  16501  33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      19778  59.9%
2      3396   10.3%
1      3282   9.9%
4      3281   9.9%
3      3265   9.9%

Other Punctuation
Value  Count  Frequency (%)
.      16501  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  49503  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      19778  40.0%
.      16501  33.3%
2      3396   6.9%
1      3282   6.6%
4      3281   6.6%
3      3265   6.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  49503  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      19778  40.0%
.      16501  33.3%
2      3396   6.9%
1      3282   6.6%
4      3281   6.6%
3      3265   6.6%
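
As a rough cross-check of the character tallies reported for No. of previous abortion, the counts can be reproduced from the value frequencies alone. The sketch below uses Python's standard `unicodedata` module; the `value_counts` dictionary restates the non-missing rows of the Common Values table, and the helper name `character_profile` is hypothetical, not part of the profiling tool.

```python
import unicodedata
from collections import Counter

# Non-missing value frequencies copied from the Common Values table.
value_counts = {"2.0": 3396, "1.0": 3282, "4.0": 3281, "0.0": 3277, "3.0": 3265}

def character_profile(counts):
    """Tally characters and Unicode general categories, weighted by value frequency."""
    chars = Counter()
    categories = Counter()
    for value, freq in counts.items():
        for ch in value:
            chars[ch] += freq
            categories[unicodedata.category(ch)] += freq
    return chars, categories

chars, categories = character_profile(value_counts)
print(sum(chars.values()))   # total characters
print(chars.most_common())   # per-character counts
print(dict(categories))      # "Nd" = Decimal Number, "Po" = Other Punctuation
```

Under these assumptions the totals agree with the report: every value contributes one "." and one trailing "0", and the 3277 occurrences of "0.0" contribute a second "0".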

Birth defects
Categorical

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 1565
Missing (%): 8.7%
Memory size: 1.1 MiB

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 131856
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Singular
2nd row: Singular
3rd row: Multiple
4th row: Multiple
5th row: Multiple

Common Values

Value      Count  Frequency (%)
Multiple   8242   45.7%
Singular   8240   45.7%
(Missing)  1565   8.7%

Length

Histogram of lengths of the category

Pie chart

Value     Count  Frequency (%)
multiple  8242   50.0%
singular  8240   50.0%

Most occurring characters

Value             Count  Frequency (%)
l                 24724  18.8%
i                 16482  12.5%
u                 16482  12.5%
M                 8242   6.3%
t                 8242   6.3%
p                 8242   6.3%
e                 8242   6.3%
S                 8240   6.2%
n                 8240   6.2%
g                 8240   6.2%
Other values (2)  16480  12.5%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  115374  87.5%
Uppercase Letter  16482   12.5%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
l      24724  21.4%
i      16482  14.3%
u      16482  14.3%
t      8242   7.1%
p      8242   7.1%
e      8242   7.1%
n      8240   7.1%
g      8240   7.1%
a      8240   7.1%
r      8240   7.1%

Uppercase Letter
Value  Count  Frequency (%)
M      8242   50.0%
S      8240   50.0%

Most occurring scripts

Value  Count   Frequency (%)
Latin  131856  100.0%

Most frequent character per script

Latin
Value             Count  Frequency (%)
l                 24724  18.8%
i                 16482  12.5%
u                 16482  12.5%
M                 8242   6.3%
t                 8242   6.3%
p                 8242   6.3%
e                 8242   6.3%
S                 8240   6.2%
n                 8240   6.2%
g                 8240   6.2%
Other values (2)  16480  12.5%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  131856  100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
l                 24724  18.8%
i                 16482  12.5%
u                 16482  12.5%
M                 8242   6.3%
t                 8242   6.3%
p                 8242   6.3%
e                 8242   6.3%
S                 8240   6.2%
n                 8240   6.2%
g                 8240   6.2%
Other values (2)  16480  12.5%

White Blood cell count (thousand per microliter)
Numeric

Distinct: 14250
Distinct (%): 86.7%
Missing: 1607
Missing (%): 8.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.475739873
Minimum: 3
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 141.1 KiB

Quantile statistics

Minimum: 3
5-th percentile: 3
Q1: 5.422143398
Median: 7.470548919
Q3: 9.517470053
95-th percentile: 12
Maximum: 12
Range: 9
Interquartile range (IQR): 4.095326655

Descriptive statistics

Standard deviation: 2.651119965
Coefficient of variation (CV): 0.3546297772
Kurtosis: -0.9713438608
Mean: 7.475739873
Median Absolute Deviation (MAD): 2.047988445
Skewness: 0.008823557343
Sum: 122901.1635
Variance: 7.028437071
Monotonicity: Not monotonic
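
Several of the statistics reported above are simple functions of one another, which makes for a quick consistency check. The sketch below restates the reported figures as constants and recomputes the derived quantities; it assumes the usual definitions (IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean times the number of non-missing values) and is not taken from the profiling tool itself.

```python
# Figures copied from the report for this variable.
n_observations = 18047
n_missing = 1607
mean = 7.475739873
std = 2.651119965
q1 = 5.422143398
q3 = 9.517470053

# Derived quantities under the usual textbook definitions.
n_valid = n_observations - n_missing   # non-missing values
iqr = q3 - q1                          # interquartile range
cv = std / mean                        # coefficient of variation
variance = std ** 2                    # variance from the standard deviation
total = mean * n_valid                 # sum of all non-missing values

print(n_valid, iqr, cv, variance, total)
```

Each recomputed value matches the corresponding reported figure to within rounding of the printed digits.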
Histogram with fixed size bins (bins=50)

Common Values

Value                 Count  Frequency (%)
3                     1114   6.2%
12                    1078   6.0%
9.341703301           1      < 0.1%
9.672297601           1      < 0.1%
7.911525971           1      < 0.1%
5.011905094           1      < 0.1%
5.436622368           1      < 0.1%
6.724889256           1      < 0.1%
3.972519439           1      < 0.1%
5.357475139           1      < 0.1%
Other values (14240)  14240  78.9%
(Missing)             1607   8.9%

Minimum 10 values

Value        Count  Frequency (%)
3            1114   6.2%
3.000736131  1      < 0.1%
3.001456988  1      < 0.1%
3.003665854  1      < 0.1%
3.003856548  1      < 0.1%
3.00559547   1      < 0.1%
3.005621539  1      < 0.1%
3.005967525  1      < 0.1%
3.006314905  1      < 0.1%
3.010220205  1      < 0.1%

Maximum 10 values

Value        Count  Frequency (%)
12           1078   6.0%
11.99985747  1      < 0.1%
11.99965298  1      < 0.1%
11.99929293  1      < 0.1%
11.99670683  1      < 0.1%
11.99667763  1      < 0.1%
11.99610031  1      < 0.1%
11.99546766  1      < 0.1%
11.99534647  1      < 0.1%
11.99532318  1      < 0.1%

Blood test result
Categorical

MISSING

Distinct: 4
Distinct (%): < 0.1%
Missing: 1564
Missing (%): 8.7%
Memory size: 1.1 MiB

Length

Max length: 17
Median length: 12
Mean length: 10.82515319
Min length: 6

Characters and Unicode

Total characters: 178431
Distinct characters: 18
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: normal
2nd row: inconclusive
3rd row: normal
4th row: normal
5th row: inconclusive

Common Values

Value              Count  Frequency (%)
slightly abnormal  4257   23.6%
inconclusive       4109   22.8%
normal             4091   22.7%
abnormal           4026   22.3%
(Missing)          1564   8.7%

Length

Histogram of lengths of the category

Pie chart

Value         Count  Frequency (%)
abnormal      8283   39.9%
slightly      4257   20.5%
inconclusive  4109   19.8%
normal        4091   19.7%

Most occurring characters

Value             Count  Frequency (%)
l                 24997  14.0%
a                 20657  11.6%
n                 20592  11.5%
o                 16483  9.2%
i                 12475  7.0%
r                 12374  6.9%
m                 12374  6.9%
s                 8366   4.7%
b                 8283   4.6%
c                 8218   4.6%
Other values (8)  33612  18.8%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  174174  97.6%
Space Separator   4257    2.4%

Most frequent character per category

Lowercase Letter
Value             Count  Frequency (%)
l                 24997  14.4%
a                 20657  11.9%
n                 20592  11.8%
o                 16483  9.5%
i                 12475  7.2%
r                 12374  7.1%
m                 12374  7.1%
s                 8366   4.8%
b                 8283   4.8%
c                 8218   4.7%
Other values (7)  29355  16.9%

Space Separator
Value    Count  Frequency (%)
(space)  4257   100.0%

Most occurring scripts

Value   Count   Frequency (%)
Latin   174174  97.6%
Common  4257    2.4%

Most frequent character per script

Latin
Value             Count  Frequency (%)
l                 24997  14.4%
a                 20657  11.9%
n                 20592  11.8%
o                 16483  9.5%
i                 12475  7.2%
r                 12374  7.1%
m                 12374  7.1%
s                 8366   4.8%
b                 8283   4.8%
c                 8218   4.7%
Other values (7)  29355  16.9%

Common
Value    Count  Frequency (%)
(space)  4257   100.0%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  178431  100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
l                 24997  14.0%
a                 20657  11.6%
n                 20592  11.5%
o                 16483  9.2%
i                 12475  7.0%
r                 12374  6.9%
m                 12374  6.9%
s                 8366   4.7%
b                 8283   4.6%
c                 8218   4.6%
Other values (8)  33612  18.8%

Symptom 1
Categorical

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 1578
Missing (%): 8.7%
Memory size: 1.0 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 49407
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 1.0

Common Values

Value      Count  Frequency (%)
1.0        9748   54.0%
0.0        6721   37.2%
(Missing)  1578   8.7%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
1.0    9748   59.2%
0.0    6721   40.8%

Most occurring characters

Value  Count  Frequency (%)
0      23190  46.9%
.      16469  33.3%
1      9748   19.7%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     32938  66.7%
Other Punctuation  16469  33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      23190  70.4%
1      9748   29.6%

Other Punctuation
Value  Count  Frequency (%)
.      16469  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  49407  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      23190  46.9%
.      16469  33.3%
1      9748   19.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  49407  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      23190  46.9%
.      16469  33.3%
1      9748   19.7%

Symptom 2
Categorical

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 1646
Missing (%): 9.1%
Memory size: 1.0 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 49203
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value      Count  Frequency (%)
1.0        9055   50.2%
0.0        7346   40.7%
(Missing)  1646   9.1%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
1.0    9055   55.2%
0.0    7346   44.8%

Most occurring characters

Value  Count  Frequency (%)
0      23747  48.3%
.      16401  33.3%
1      9055   18.4%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     32802  66.7%
Other Punctuation  16401  33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      23747  72.4%
1      9055   27.6%

Other Punctuation
Value  Count  Frequency (%)
.      16401  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  49203  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      23747  48.3%
.      16401  33.3%
1      9055   18.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  49203  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      23747  48.3%
.      16401  33.3%
1      9055   18.4%

Symptom 3
Categorical

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 1530
Missing (%): 8.5%
Memory size: 1.0 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 49551
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 0.0
5th row: 0.0

Common Values

Value      Count  Frequency (%)
1.0        8882   49.2%
0.0        7635   42.3%
(Missing)  1530   8.5%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
1.0    8882   53.8%
0.0    7635   46.2%

Most occurring characters

Value  Count  Frequency (%)
0      24152  48.7%
.      16517  33.3%
1      8882   17.9%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     33034  66.7%
Other Punctuation  16517  33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      24152  73.1%
1      8882   26.9%

Other Punctuation
Value  Count  Frequency (%)
.      16517  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  49551  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      24152  48.7%
.      16517  33.3%
1      8882   17.9%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  49551  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      24152  48.7%
.      16517  33.3%
1      8882   17.9%

Symptom 4
Categorical

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 1566
Missing (%): 8.7%
Memory size: 1.0 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 49443
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 0.0
4th row: 0.0
5th row: 1.0

Common Values

Value      Count  Frequency (%)
0.0        8257   45.8%
1.0        8224   45.6%
(Missing)  1566   8.7%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
0.0    8257   50.1%
1.0    8224   49.9%

Most occurring characters

Value  Count  Frequency (%)
0      24738  50.0%
.      16481  33.3%
1      8224   16.6%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     32962  66.7%
Other Punctuation  16481  33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      24738  75.1%
1      8224   24.9%

Other Punctuation
Value  Count  Frequency (%)
.      16481  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  49443  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      24738  50.0%
.      16481  33.3%
1      8224   16.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  49443  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      24738  50.0%
.      16481  33.3%
1      8224   16.6%

Symptom 5
Categorical

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 1613
Missing (%): 8.9%
Memory size: 1.0 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 49302
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value      Count  Frequency (%)
0.0        8803   48.8%
1.0        7631   42.3%
(Missing)  1613   8.9%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
0.0    8803   53.6%
1.0    7631   46.4%

Most occurring characters

Value  Count  Frequency (%)
0      25237  51.2%
.      16434  33.3%
1      7631   15.5%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     32868  66.7%
Other Punctuation  16434  33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      25237  76.8%
1      7631   23.2%

Other Punctuation
Value  Count  Frequency (%)
.      16434  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  49302  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      25237  51.2%
.      16434  33.3%
1      7631   15.5%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  49302  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      25237  51.2%
.      16434  33.3%
1      7631   15.5%

Genetic Disorder
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 MiB

Length

Max length: 44
Median length: 43
Mean length: 38.88064498
Min length: 32

Characters and Unicode

Total characters: 701679
Distinct characters: 19
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Mitochondrial genetic inheritance disorders
2nd row: Multifactorial genetic inheritance disorders
3rd row: Mitochondrial genetic inheritance disorders
4th row: Multifactorial genetic inheritance disorders
5th row: Single-gene inheritance diseases

Common Values

Value                                         Count  Frequency (%)
Mitochondrial genetic inheritance disorders   9241   51.2%
Single-gene inheritance diseases              6929   38.4%
Multifactorial genetic inheritance disorders  1877   10.4%

Length

Histogram of lengths of the category

Pie chart

Value           Count  Frequency (%)
inheritance     18047  27.7%
disorders       11118  17.0%
genetic         11118  17.0%
mitochondrial   9241   14.2%
single-gene     6929   10.6%
diseases        6929   10.6%
multifactorial  1877   2.9%

Most occurring characters

Value             Count   Frequency (%)
e                 104093  14.8%
i                 94424   13.5%
n                 70311   10.0%
r                 51401   7.3%
(space)           47212   6.7%
s                 43023   6.1%
t                 42160   6.0%
c                 40283   5.7%
d                 38406   5.5%
a                 37971   5.4%
Other values (9)  132395  18.9%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  629491  89.7%
Space Separator   47212   6.7%
Uppercase Letter  18047   2.6%
Dash Punctuation  6929    1.0%

Most frequent character per category

Lowercase Letter
Value             Count   Frequency (%)
e                 104093  16.5%
i                 94424   15.0%
n                 70311   11.2%
r                 51401   8.2%
s                 43023   6.8%
t                 42160   6.7%
c                 40283   6.4%
d                 38406   6.1%
a                 37971   6.0%
o                 31477   5.0%
Other values (5)  75942   12.1%

Uppercase Letter
Value  Count  Frequency (%)
M      11118  61.6%
S      6929   38.4%

Space Separator
Value    Count  Frequency (%)
(space)  47212  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      6929   100.0%

Most occurring scripts

Value   Count   Frequency (%)
Latin   647538  92.3%
Common  54141   7.7%

Most frequent character per script

Latin
Value             Count   Frequency (%)
e                 104093  16.1%
i                 94424   14.6%
n                 70311   10.9%
r                 51401   7.9%
s                 43023   6.6%
t                 42160   6.5%
c                 40283   6.2%
d                 38406   5.9%
a                 37971   5.9%
o                 31477   4.9%
Other values (7)  93989   14.5%

Common
Value    Count  Frequency (%)
(space)  47212  87.2%
-        6929   12.8%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  701679  100.0%

Most frequent character per block

ASCII
Value             Count   Frequency (%)
e                 104093  14.8%
i                 94424   13.5%
n                 70311   10.0%
r                 51401   7.3%
(space)           47212   6.7%
s                 43023   6.1%
t                 42160   6.0%
c                 40283   5.7%
d                 38406   5.5%
a                 37971   5.4%
Other values (9)  132395  18.9%

Disorder Subclass
Categorical

HIGH CORRELATION

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.2 MiB

Length

Max length: 35
Median length: 14
Mean length: 15.36549011
Min length: 6

Characters and Unicode

Total characters: 277301
Distinct characters: 31
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Leber's hereditary optic neuropathy
2nd row: Diabetes
3rd row: Leigh syndrome
4th row: Cancer
5th row: Cystic fibrosis

Common Values

Value                                Count  Frequency (%)
Leigh syndrome                       4683   25.9%
Mitochondrial myopathy               3971   22.0%
Cystic fibrosis                      3145   17.4%
Tay-Sachs                            2556   14.2%
Diabetes                             1653   9.2%
Hemochromatosis                      1228   6.8%
Leber's hereditary optic neuropathy  587    3.3%
Alzheimer's                          133    0.7%
Cancer                               91     0.5%

Length

Histogram of lengths of the category

Pie chart

Value             Count  Frequency (%)
leigh             4683   14.8%
syndrome          4683   14.8%
mitochondrial     3971   12.6%
myopathy          3971   12.6%
cystic            3145   10.0%
fibrosis          3145   10.0%
tay-sachs         2556   8.1%
diabetes          1653   5.2%
hemochromatosis   1228   3.9%
optic             587    1.9%
Other values (5)  1985   6.3%

Most occurring characters

Value              Count  Frequency (%)
i                  26248  9.5%
o                  24599  8.9%
s                  21503  7.8%
y                  19500  7.0%
h                  17716  6.4%
a                  17200  6.2%
e                  17192  6.2%
t                  15729  5.7%
r                  15599  5.6%
(space)            13560  4.9%
Other values (21)  88455  31.9%

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   239862  86.5%
Uppercase Letter   20603   7.4%
Space Separator    13560   4.9%
Dash Punctuation   2556    0.9%
Other Punctuation  720     0.3%

Most frequent character per category

Lowercase Letter
Value              Count  Frequency (%)
i                  26248  10.9%
o                  24599  10.3%
s                  21503  9.0%
y                  19500  8.1%
h                  17716  7.4%
a                  17200  7.2%
e                  17192  7.2%
t                  15729  6.6%
r                  15599  6.5%
c                  11578  4.8%
Other values (10)  52998  22.1%

Uppercase Letter
Value  Count  Frequency (%)
L      5270   25.6%
M      3971   19.3%
C      3236   15.7%
T      2556   12.4%
S      2556   12.4%
D      1653   8.0%
H      1228   6.0%
A      133    0.6%

Other Punctuation
Value  Count  Frequency (%)
'      720    100.0%

Space Separator
Value    Count  Frequency (%)
(space)  13560  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      2556   100.0%

Most occurring scripts

Value   Count   Frequency (%)
Latin   260465  93.9%
Common  16836   6.1%

Most frequent character per script

Latin
Value              Count  Frequency (%)
i                  26248  10.1%
o                  24599  9.4%
s                  21503  8.3%
y                  19500  7.5%
h                  17716  6.8%
a                  17200  6.6%
e                  17192  6.6%
t                  15729  6.0%
r                  15599  6.0%
c                  11578  4.4%
Other values (18)  73601  28.3%

Common
Value    Count  Frequency (%)
(space)  13560  80.5%
-        2556   15.2%
'        720    4.3%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  277301  100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
i                  26248  9.5%
o                  24599  8.9%
s                  21503  7.8%
y                  19500  7.0%
h                  17716  6.4%
a                  17200  6.2%
e                  17192  6.2%
t                  15729  5.7%
r                  15599  5.6%
(space)            13560  4.9%
Other values (21)  88455  31.9%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
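
The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used to generate the report; the function name `pearson_r` and the sample data are made up for the example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
r = pearson_r(x, y)
# Shifting and rescaling x (with a positive scale) leaves r unchanged.
r_shifted = pearson_r([10 * v + 3 for v in x], y)
print(r, r_shifted)
```

The second call illustrates the invariance under changes of location and scale mentioned above; a negative scale factor would flip the sign of r.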

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
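
A minimal sketch of that recipe follows: rank both variables, then apply the Pearson formula to the ranks. Giving tied values their average rank is an assumption of this example (a common convention), and the helper names are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]      # monotonic but nonlinear in x
print(spearman_rho(x, y))    # perfect monotonic association
print(pearson_r(x, y))       # below 1 because the relation is not linear
```

The cubic example shows the difference in practice: the ranks of x and y coincide, so ρ reaches its maximum while r does not.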

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
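
The pair-counting definition above translates directly into code. The sketch below implements exactly that form (often called tau-a), in which tied pairs count as neither concordant nor discordant; library implementations frequently use a tie-adjusted variant instead, so results on data with many ties may differ. The function name is hypothetical.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1      # both variables move in the same direction
        elif product < 0:
            discordant += 1      # the variables move in opposite directions
        # product == 0 means a tie in x or y: counted as neither
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs

print(kendall_tau_a([1, 2, 3], [1, 2, 3]))
print(kendall_tau_a([1, 2, 3], [3, 2, 1]))
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

In the last call two of the three pairs are concordant and one is discordant, so τ is (2 - 1) / 3.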

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
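
For illustration, a bias-corrected Cramér's V can be sketched from a contingency table as below. The formula follows the commonly cited form of Bergsma's correction; whether the report's own implementation matches it term for term is an assumption, and the function name is hypothetical.

```python
from math import sqrt

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    r, k = len(row_totals), len(col_totals)
    if n < 2:
        raise ValueError("need at least two observations")
    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    phi2 = chi2 / n
    # Correction terms: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return 0.0
    return sqrt(phi2_corr / denom)

print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfect association
print(cramers_v_corrected([[25, 25], [25, 25]]))  # independence
```

Clamping the corrected φ² at zero is what keeps the estimate from going negative when the observed association is weaker than the expected sampling noise.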

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
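
To make the idea of nullity correlation concrete, the sketch below treats each column as a 0/1 presence indicator and computes the Pearson correlation between two such indicators. This is an assumed reading of how the heatmap values are obtained, shown on a tiny made-up table; the function name and the column names `a`, `b`, `c` are hypothetical.

```python
from math import sqrt

def nullity_correlation(rows, col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    a = [0.0 if row.get(col_a) is None else 1.0 for row in rows]
    b = [0.0 if row.get(col_b) is None else 1.0 for row in rows]
    n = len(rows)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A column that is always present or always missing carries no signal.
        return float("nan")
    return cov / sqrt(var_a * var_b)

rows = [
    {"a": 1, "b": 2, "c": None},
    {"a": None, "b": None, "c": 3},
    {"a": 5, "b": 6, "c": None},
    {"a": None, "b": None, "c": 7},
]
print(nullity_correlation(rows, "a", "b"))  # missing together
print(nullity_correlation(rows, "a", "c"))  # one present exactly when the other is missing
```

A value near +1 means two columns tend to be missing in the same rows, a value near -1 means one is present when the other is missing, and a value near 0 means the missingness patterns are unrelated.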

Sample

First rows

(index) | df_index | Patient Age | Genes in mother's side | Inherited from father | Maternal gene | Paternal gene | Blood cell count (mcL) | Mother's age | Father's age | Status | Respiratory Rate (breaths/min) | Heart Rate (rates/min | Follow-up | Gender | Birth asphyxia | Autopsy shows birth defect (if applicable) | Folic acid details (peri-conceptional) | H/O serious maternal illness | H/O radiation exposure (x-ray) | H/O substance abuse | Assisted conception IVF/ART | History of anomalies in previous pregnancies | No. of previous abortion | Birth defects | White Blood cell count (thousand per microliter) | Blood test result | Symptom 1 | Symptom 2 | Symptom 3 | Symptom 4 | Symptom 5 | Genetic Disorder | Disorder Subclass
0 | 0 | 2.0 | Yes | No | Yes | No | 4.760603 | NaN | NaN | Alive | Normal (30-60) | Normal | High | NaN | NaN | Not applicable | No | NaN | No | No | No | Yes | NaN | NaN | 9.857562 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | Mitochondrial genetic inheritance disorders | Leber's hereditary optic neuropathy
1 | 2 | 6.0 | Yes | No | No | No | 4.893297 | 41.0 | 22.0 | Alive | Normal (30-60) | Tachycardia | Low | NaN | No record | Not applicable | Yes | No | Yes | NaN | Yes | Yes | 4.0 | Singular | NaN | normal | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | Multifactorial genetic inheritance disorders | Diabetes
2 | 3 | 12.0 | Yes | No | Yes | No | 4.705280 | 21.0 | NaN | Deceased | Tachypnea | Normal | High | Male | Not available | No | No | Yes | - | Not applicable | NaN | Yes | 1.0 | Singular | 7.919321 | inconclusive | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | Mitochondrial genetic inheritance disorders | Leigh syndrome
3 | 4 | 11.0 | Yes | No | NaN | Yes | 4.720703 | 32.0 | NaN | Alive | Tachypnea | Tachycardia | Low | Male | Not available | Not applicable | No | Yes | - | Not applicable | Yes | No | 4.0 | Multiple | 4.098210 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | NaN | Multifactorial genetic inheritance disorders | Cancer
4 | 5 | 14.0 | Yes | No | Yes | No | 5.103188 | NaN | NaN | Deceased | NaN | Normal | Low | Female | Not available | None | No | No | No | No | NaN | No | 0.0 | Multiple | 10.272230 | normal | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | Single-gene inheritance diseases | Cystic fibrosis
5 | 6 | 3.0 | Yes | No | Yes | Yes | 4.901080 | NaN | 63.0 | Alive | Normal (30-60) | NaN | Low | Male | No record | Not applicable | NaN | Yes | No | Not applicable | Yes | No | 3.0 | Multiple | 6.825974 | normal | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Single-gene inheritance diseases | Tay-Sachs
6 | 7 | 3.0 | No | No | Yes | Yes | 4.964816 | 40.0 | NaN | Alive | Tachypnea | Normal | Low | NaN | No record | Not applicable | Yes | Yes | No | - | No | Yes | 1.0 | Singular | 9.836352 | inconclusive | 0.0 | 0.0 | 1.0 | NaN | 0.0 | Single-gene inheritance diseases | Tay-Sachs
7 | 8 | 11.0 | No | No | Yes | No | 5.209058 | 45.0 | 44.0 | Alive | Tachypnea | Tachycardia | Low | Male | Yes | Not applicable | Yes | Yes | No | No | No | Yes | 0.0 | Multiple | 6.669552 | slightly abnormal | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | Mitochondrial genetic inheritance disorders | Leigh syndrome
8 | 9 | 4.0 | No | Yes | Yes | Yes | 4.752272 | 44.0 | 42.0 | Alive | Tachypnea | Tachycardia | Low | Male | No | Not applicable | Yes | No | No | No | Yes | Yes | 1.0 | Multiple | 6.397702 | abnormal | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | Multifactorial genetic inheritance disorders | Diabetes
9 | 10 | 6.0 | Yes | No | NaN | No | 4.750824 | NaN | NaN | Deceased | Tachypnea | NaN | Low | Male | No record | None | No | Yes | Yes | Not applicable | Yes | NaN | 1.0 | Singular | 5.957321 | abnormal | 1.0 | NaN | 0.0 | 0.0 | NaN | Single-gene inheritance diseases | Hemochromatosis

Last rows

(index) | df_index | Patient Age | Genes in mother's side | Inherited from father | Maternal gene | Paternal gene | Blood cell count (mcL) | Mother's age | Father's age | Status | Respiratory Rate (breaths/min) | Heart Rate (rates/min | Follow-up | Gender | Birth asphyxia | Autopsy shows birth defect (if applicable) | Folic acid details (peri-conceptional) | H/O serious maternal illness | H/O radiation exposure (x-ray) | H/O substance abuse | Assisted conception IVF/ART | History of anomalies in previous pregnancies | No. of previous abortion | Birth defects | White Blood cell count (thousand per microliter) | Blood test result | Symptom 1 | Symptom 2 | Symptom 3 | Symptom 4 | Symptom 5 | Genetic Disorder | Disorder Subclass
18037 | 22071 | 0.0 | Yes | No | Yes | Yes | 5.084669 | NaN | NaN | Deceased | Tachypnea | Tachycardia | Low | NaN | No record | Yes | Yes | No | NaN | - | No | No | NaN | Singular | 9.393462 | abnormal | 1.0 | 1.0 | 0.0 | NaN | 1.0 | Mitochondrial genetic inheritance disorders | Leigh syndrome
18038 | 22072 | 6.0 | No | No | NaN | No | 4.828224 | 34.0 | 45.0 | Deceased | Tachypnea | Normal | High | Male | Not available | Yes | No | Yes | Yes | NaN | No | No | 4.0 | Singular | 5.673544 | abnormal | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | Mitochondrial genetic inheritance disorders | Mitochondrial myopathy
18039 | 22074 | 4.0 | No | No | NaN | No | 4.789307 | 35.0 | 51.0 | Alive | Tachypnea | Normal | Low | Male | Yes | Not applicable | No | No | - | No | No | No | 3.0 | Multiple | NaN | normal | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | Single-gene inheritance diseases | Hemochromatosis
18040 | 22075 | 10.0 | No | No | Yes | Yes | 4.643860 | 49.0 | NaN | Deceased | NaN | Normal | Low | NaN | No | Yes | No | NaN | NaN | Not applicable | Yes | Yes | 2.0 | Multiple | 9.581455 | abnormal | 1.0 | 0.0 | 0.0 | 0.0 | NaN | Mitochondrial genetic inheritance disorders | Mitochondrial myopathy
18041 | 22076 | 0.0 | Yes | No | Yes | No | 4.931758 | NaN | 50.0 | Alive | Normal (30-60) | Tachycardia | Low | Female | No record | Not applicable | No | No | Not applicable | No | Yes | Yes | 1.0 | Singular | 11.649052 | abnormal | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | Mitochondrial genetic inheritance disorders | Leigh syndrome
18042 | 22077 | 9.0 | No | Yes | Yes | Yes | 5.012599 | 47.0 | NaN | Deceased | NaN | Normal | NaN | Ambiguous | No record | Yes | Yes | No | No | Not applicable | Yes | Yes | NaN | NaN | 12.000000 | slightly abnormal | NaN | 1.0 | 0.0 | 0.0 | 0.0 | Mitochondrial genetic inheritance disorders | Leigh syndrome
18043 | 22078 | 4.0 | Yes | Yes | Yes | No | 5.258298 | 35.0 | 64.0 | Deceased | Normal (30-60) | Tachycardia | High | Female | No | No | NaN | No | Not applicable | No | Yes | No | 3.0 | Multiple | 6.584811 | inconclusive | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | Mitochondrial genetic inheritance disorders | Leigh syndrome
18044 | 22079 | 8.0 | No | Yes | No | Yes | 4.974220 | NaN | 56.0 | Alive | Normal (30-60) | Normal | High | Ambiguous | No | Not applicable | Yes | Yes | No | - | Yes | No | 2.0 | Multiple | 7.041556 | inconclusive | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | Multifactorial genetic inheritance disorders | Diabetes
18045 | 22080 | 8.0 | Yes | No | Yes | No | 5.186470 | 35.0 | 51.0 | Deceased | Tachypnea | Normal | High | Male | No | None | No | No | NaN | No | No | No | 2.0 | Singular | 7.715464 | normal | 0.0 | 0.0 | 0.0 | 1.0 | NaN | Mitochondrial genetic inheritance disorders | Mitochondrial myopathy
18046 | 22082 | 11.0 | Yes | No | No | No | 4.738067 | 32.0 | 62.0 | Deceased | Normal (30-60) | Normal | High | Female | Yes | None | Yes | Yes | Not applicable | No | Yes | Yes | 4.0 | Singular | 11.188371 | normal | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | Multifactorial genetic inheritance disorders | Diabetes